Steps data analysis
Univariate descriptions - categorical variables
Data table
Graphs
Univariate descriptions - numerical variables
Summary
Confidence intervals
Graphs
Boxplots - numerical
Joint distribution tables
Outliers
Parametric testing
Relationships & correlations
- Residual plots
Regressions
Data problems
Introduction
The variables included in the data set are:
| Field | Description |
|---|---|
| AmountWeek | How many cups of coffee do you typically consume weekly? |
| AmountOutMonth | How frequently do you drink out-of-home per month on average? |
| MoneyCoffee | How much money on average do you estimate you spend on coffee per month? |
| MoneyGroceries | How much on average do you spend on general groceries per month? |
| Machine | How do you brew your coffee at home? |
| Brand change | How often do you switch between coffee brands? |
| Purchase location | Where do you usually purchase your coffee? |
| Supermarket_Positive_Reasons | When you purchase coffee from the supermarket what are your main reasons for doing so? |
| Supermarket_Negative_Reasons | What would be reasons why you would not purchase coffee from the supermarket? |
| Criteria_Type_Coffee | What are your main criteria or evaluation points for choosing the type of coffee? |
| KnowledgeCoffee | How would you describe your knowledge level regarding coffee in general? |
| Purchase_Price | I believe that the ____ is important to my decision on which coffee to purchase. |
| Purchase_Sustainability | I believe that the ____ is important to my decision on which coffee to purchase. |
| Purchase_Fairtrade | I believe that the ____ is important to my decision on which coffee to purchase. |
| Purchase_Packaging | I believe that the ____ is important to my decision on which coffee to purchase. |
| Frequency_Specialty | How often do you drink specialty coffee? |
| Subscription_Likely | How likely are you to have an online subscription for (specialty) coffee? |
| Subscription_Not_Likely | What is the number one reason why you would be hesitant? |
| App_Likely | How likely are you to value and use an app for your online subscription? |
| Gender | What is your gender? |
| AgeCategory | What is your age category? |
| Occupation | What is your occupational status? |
| Education | What level of education have you completed? |
| Home | How would you describe the place you currently live in? |
Univariate descriptions - Categorical variables
Age category
| Age Category | Absolute | Relative |
|---|---|---|
| < 18 | 2 | 0.89% |
| 18-25 | 64 | 28.57% |
| 25-45 | 99 | 44.20% |
| 45-60 | 48 | 21.43% |
| > 60 | 11 | 4.91% |
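The relative column in these tables is each absolute count divided by the total number of respondents. A minimal sketch of that calculation, using the age-category counts from the table above (the function name is illustrative, not from the original analysis):

```python
# Absolute counts copied from the age-category table above.
age_counts = {"< 18": 2, "18-25": 64, "25-45": 99, "45-60": 48, "> 60": 11}


def relative_frequencies(counts):
    """Return each category's share of the total as a percentage (2 decimals)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


n_respondents = sum(age_counts.values())
age_shares = relative_frequencies(age_counts)

for label, share in age_shares.items():
    print(f"{label:>6}  {age_counts[label]:>3}  {share:.2f}%")
```

The same helper applies to every categorical table in this section, since each one sums to the same number of respondents.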
Home
| Home | Absolute | Relative |
|---|---|---|
| Rural (Town) | 23 | 10.27% |
| Suburbs | 18 | 8.04% |
| Urban (City) | 183 | 81.70% |
Gender
| Gender | Absolute | Relative |
|---|---|---|
| Female | 147 | 65.62% |
| Male | 75 | 33.48% |
| Other | 2 | 0.89% |
Education
| Education | Absolute | Relative |
|---|---|---|
| Elementary school | 3 | 1.34% |
| High school | 21 | 9.38% |
| Associate degree | 18 | 8.04% |
| Bachelor’s degree | 122 | 54.46% |
| Master’s degree | 56 | 25.00% |
| PhD | 4 | 1.79% |
Machine
| Machine | Absolute | Relative |
|---|---|---|
| Aeropress | 1 | 0.45% |
| CupMachine | 72 | 32.14% |
| Espresso machine | 71 | 31.70% |
| Filter machine | 47 | 20.98% |
| French press | 8 | 3.57% |
| Instant coffee | 5 | 2.23% |
| Moka pot | 16 | 7.14% |
| V60 | 4 | 1.79% |
Brand change
| Brand change | Absolute | Relative |
|---|---|---|
| Never | 73 | 32.59% |
| Sometimes | 128 | 57.14% |
| Very often | 20 | 8.93% |
| Every time | 3 | 1.34% |
Purchase Method
| Purchase Method | Absolute | Relative |
|---|---|---|
| E-commerce | 38 | 16.96% |
| Online subscription | 9 | 4.02% |
| Specialty stores or cafés | 27 | 12.05% |
| The supermarket | 150 | 66.96% |
Multiple option answers:
Reasons for buying from the supermarket
| Reason | Frequency |
|---|---|
| Convenience | 1 |
| I am satisfied with the product | 39 |
| I do not have special stores near where I live | 8 |
| I do not purchase coffee from the supermarket | 1 |
| Other | 1 |
| Price | 14 |
| Time-saving | 32 |
| Convenience | 49 |
| I am satisfied with the product | 49 |
| I do not have special stores near where I live | 8 |
| I do not purchase coffee from the supermarket | 39 |
| Other | 2 |
| Price | 55 |
| Time-saving | 22 |
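Each reason appears twice in the table above, with different counts. One possible cause is that multi-select answers were split on a separator without trimming whitespace, so that " Price" and "Price" were tallied as separate labels. This is an assumption and is not confirmed by the source. Under that assumption, a tally that trims each option before counting might look like the sketch below. The example answers are hypothetical and are not taken from the survey data.

```python
from collections import Counter


def tally_multi_select(responses, sep=","):
    """Count every option chosen across separator-delimited multi-select answers."""
    counts = Counter()
    for response in responses:
        for option in response.split(sep):
            option = option.strip()  # drop whitespace left behind by the split
            if option:               # ignore empty answers
                counts[option] += 1
    return counts


# Hypothetical raw answers, NOT taken from the survey data.
example_answers = ["Price, Convenience", "Convenience", "Price, Time-saving", ""]
tally = tally_multi_select(example_answers)

for option, count in sorted(tally.items()):
    print(option, count)
```

With trimmed labels each reason gets a single row, and the counts are the number of respondents who selected it in any position.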
Reasons for not buying from the supermarket
| Reason | Frequency |
|---|---|
| Better quality elsewhere | 17 |
| I don’t buy from supermarkets | 1 |
| Lack of sustainable options | 4 |
| Not enough variety | 15 |
| Not wanting to support big corporations | 8 |
| Better quality elsewhere | 73 |
| I don’t buy from supermarkets | 6 |
| It is not fresh | 15 |
| Lack of sustainable options | 4 |
| No reason | 98 |
| Not enough variety | 13 |
| Not wanting to support big corporations | 13 |
| Price | 2 |
Criteria for choosing the type of coffee
Purchase decisions 1-5
Price
| Purchase decision - price | Absolute | Relative |
|---|---|---|
| 1 | 24 | 10.71% |
| 2 | 54 | 24.11% |
| 3 | 54 | 24.11% |
| 4 | 51 | 22.77% |
| 5 | 41 | 18.30% |
Sustainability
| Purchase decision - sustainability | Absolute | Relative |
|---|---|---|
| 1 | 18 | 8.04% |
| 2 | 36 | 16.07% |
| 3 | 82 | 36.61% |
| 4 | 56 | 25.00% |
| 5 | 32 | 14.29% |
Certificates
| Purchase decision - certificate | Absolute | Relative |
|---|---|---|
| 1 | 42 | 18.75% |
| 2 | 63 | 28.12% |
| 3 | 74 | 33.04% |
| 4 | 34 | 15.18% |
| 5 | 11 | 4.91% |
Fairtrade
| Purchase decision - fairtrade | Absolute | Relative |
|---|---|---|
| 1 | 21 | 9.38% |
| 2 | 35 | 15.62% |
| 3 | 76 | 33.93% |
| 4 | 60 | 26.79% |
| 5 | 32 | 14.29% |
Packaging
| Purchase decision - packaging | Absolute | Relative |
|---|---|---|
| 1 | 68 | 30.36% |
| 2 | 62 | 27.68% |
| 3 | 44 | 19.64% |
| 4 | 36 | 16.07% |
| 5 | 14 | 6.25% |
Combined data
| Importance | Price | Sustainability | Certificates | Fairtrade | Packaging |
|---|---|---|---|---|---|
| 1 | 24 | 18 | 42 | 21 | 68 |
| 2 | 54 | 36 | 63 | 35 | 62 |
| 3 | 54 | 82 | 74 | 76 | 44 |
| 4 | 51 | 56 | 34 | 60 | 36 |
| 5 | 41 | 32 | 11 | 32 | 14 |
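A combined table can be assembled directly from the five per-item tables. The sketch below copies the counts from those tables and checks that every column sums to the number of respondents, which is a cheap guard against a column being shifted or pasted twice. The helper name is illustrative.

```python
# Counts per importance rating (1 to 5), copied from the five individual tables.
ratings = {
    "Price":          [24, 54, 54, 51, 41],
    "Sustainability": [18, 36, 82, 56, 32],
    "Certificates":   [42, 63, 74, 34, 11],
    "Fairtrade":      [21, 35, 76, 60, 32],
    "Packaging":      [68, 62, 44, 36, 14],
}


def combine(columns):
    """Return one (rating, [count per column]) row for each rating 1..5."""
    names = list(columns)
    return [(level + 1, [columns[name][level] for name in names])
            for level in range(5)]


combined = combine(ratings)
column_totals = {name: sum(counts) for name, counts in ratings.items()}

print("Importance", *ratings)
for rating, row in combined:
    print(rating, *row)
```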
Frequency specialty coffee consumption
| Frequency specialty coffee consumption | Absolute | Relative |
|---|---|---|
| I do (did) not know what this is | 53 | 23.66% |
| Never | 40 | 17.86% |
| Only in cafes | 46 | 20.54% |
| Sometimes | 61 | 27.23% |
| Always | 24 | 10.71% |
Reasons for not being likely to set up a subscription
| Reason | Frequency |
|---|---|
| I am happy with my coffee now | 2 |
| I do not consume enough coffee at home | 5 |
| I do not like being stuck with subscriptions | 46 |
| No reason | 3 |
| Other | 3 |
| The packaging that is required for delivery | 10 |
| The price | 42 |
| I already have a subscription | 9 |
| I am happy with my coffee now | 105 |
| I do not consume enough coffee at home | 17 |
| I do not like being stuck with subscriptions | 64 |
| No reason | 11 |
| Other | 1 |
| The packaging that is required for delivery | 4 |
| The price | 13 |
Univariate descriptions - Numerical variables
Amount of coffee consumed weekly
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 1.00 | 10.00 | 15.50 | 18.81 | 25.00 | 70.00 |
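The six values reported for each numerical variable are the minimum, the quartiles, the mean and the maximum. A minimal sketch of how such a summary can be computed follows. It uses the "inclusive" quantile method from Python's `statistics` module, which interpolates linearly between order statistics and should agree with R's default quantile type. The data values are hypothetical and are not the survey responses.

```python
import statistics


def six_number_summary(values):
    """Return min, quartiles, mean and max, in the layout of the summaries above."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# Hypothetical weekly cup counts, NOT the survey data.
example_cups = [1, 7, 10, 14, 17, 21, 25, 30, 70]
summary = six_number_summary(example_cups)

for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

A mean well above the median, as in the reported summaries, points to a right-skewed distribution with a few large values.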
Amount per month out of house
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.000 | 2.000 | 5.000 | 8.107 | 10.000 | 40.000 |
Money coffee
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 1.00 | 10.00 | 20.00 | 25.38 | 35.00 | 120.00 |
Money groceries
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 0.0 | 157.5 | 200.0 | 247.3 | 300.0 | 900.0 |
Subscription likely
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 1.00 | 1.00 | 3.00 | 3.79 | 6.00 | 10.00 |
App likely
| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| 1.000 | 1.000 | 3.000 | 4.179 | 7.000 | 10.000 |
Boxplots
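A boxplot conventionally marks points beyond 1.5 times the interquartile range from the box as outliers. The sketch below applies that rule to the quartiles and maxima reported in the numerical summaries. It uses the reported quartiles in place of the hinges a plotting routine would compute, so the fences are approximate. Variable names follow the field table, and the helper name is illustrative.

```python
# (1st quartile, 3rd quartile, maximum) copied from the numerical summaries.
quartiles = {
    "AmountWeek":     (10.0, 25.0, 70.0),
    "AmountOutMonth": (2.0, 10.0, 40.0),
    "MoneyCoffee":    (10.0, 35.0, 120.0),
    "MoneyGroceries": (157.5, 300.0, 900.0),
}


def upper_fence(q1, q3, k=1.5):
    """Upper whisker limit: Q3 plus k times the interquartile range."""
    return q3 + k * (q3 - q1)


fences = {name: upper_fence(q1, q3) for name, (q1, q3, _) in quartiles.items()}
has_high_outlier = {
    name: maximum > fences[name] for name, (_, _, maximum) in quartiles.items()
}

for name in quartiles:
    print(name, fences[name], has_high_outlier[name])
```

Under this rule the maximum of each of the four variables lies above its upper fence, so each boxplot should show at least one high outlier.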
Parametric testing
H_0: There is no association between the two variables.
H_a: There is an association between the two variables.
Age - Amount of coffee consumed
Pearson's Chi-squared test
data: AmountWeek and AgeCategory
X-squared = 254.16, df = 136, p-value = 0.000000003461
Pearson's Chi-squared test with simulated p-value (based on 500
replicates)
data: AmountWeek and AgeCategory
X-squared = 254.16, df = NA, p-value = 0.003992
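Two numbers in these outputs can be checked by hand. The degrees of freedom equal (rows - 1) x (columns - 1) of the contingency table, and a simulated p-value based on B replicates is (1 + k) / (B + 1), where k is the number of simulated statistics at least as large as the observed one. The sketch below illustrates both, along with the statistic itself (without continuity correction) on a hypothetical 2 x 2 table. The count of 35 distinct AmountWeek values is inferred from df = 136 and five age categories. It is not stated in the source.

```python
def chi2_df(n_rows, n_cols):
    """Degrees of freedom of Pearson's chi-squared test of independence."""
    return (n_rows - 1) * (n_cols - 1)


def simulated_p_value(n_at_least_as_extreme, n_replicates):
    """Monte Carlo p-value: (1 + k) / (B + 1)."""
    return (1 + n_at_least_as_extreme) / (n_replicates + 1)


def chi2_statistic(table):
    """Pearson's X-squared for a table of counts, without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# 35 distinct AmountWeek values is an inference from df = 136, not a reported fact.
df_age = chi2_df(35, 5)
p_one_exceedance = simulated_p_value(1, 500)   # compare with 0.003992 above
p_no_exceedance = simulated_p_value(0, 500)    # smallest value 500 replicates can give

example_stat = chi2_statistic([[10, 20], [20, 10]])  # hypothetical counts
```

With 500 replicates the simulated p-value cannot fall below 1/501, which is why it is far larger than the asymptotic p-value when the association is strong. A large df relative to 224 respondents also means many sparse cells, which is the usual reason for reporting the simulated version alongside the asymptotic one.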
Education - Amount of coffee consumed
Pearson's Chi-squared test
data: AmountWeek and Education
X-squared = 236.72, df = 170, p-value = 0.000546
Pearson's Chi-squared test with simulated p-value (based on 500
replicates)
data: AmountWeek and Education
X-squared = 236.72, df = NA, p-value = 0.03992
Gender - Amount of coffee consumed
Pearson's Chi-squared test
data: AmountWeek and Gender
X-squared = 71.44, df = 68, p-value = 0.3643
Pearson's Chi-squared test with simulated p-value (based on 500
replicates)
data: AmountWeek and Gender
X-squared = 71.44, df = NA, p-value = 0.2894
Home - Amount of coffee consumed
Pearson's Chi-squared test
data: AmountWeek and Home
X-squared = 68.057, df = 68, p-value = 0.4753
Pearson's Chi-squared test with simulated p-value (based on 500
replicates)
data: AmountWeek and Home
X-squared = 68.057, df = NA, p-value = 0.501
App - Age
Pearson's Chi-squared test
data: App_Likely and AgeCategory
X-squared = 53.162, df = 36, p-value = 0.03254
Pearson's Chi-squared test with simulated p-value (based on 500
replicates)
data: App_Likely and AgeCategory
X-squared = 53.162, df = NA, p-value = 0.04192
Coffee knowledge - Age
Pearson's Chi-squared test
data: KnowledgeCoffee and AgeCategory
X-squared = 151.89, df = 36, p-value = 0.0000000000000003491
Pearson's Chi-squared test with simulated p-value (based on 500
replicates)
data: KnowledgeCoffee and AgeCategory
X-squared = 151.89, df = NA, p-value = 0.001996
Coffee knowledge - Purchase location
Pearson's Chi-squared test
data: KnowledgeCoffee and PurchaseLocation
X-squared = 35.066, df = 27, p-value = 0.1372
Pearson's Chi-squared test with simulated p-value (based on 500
replicates)
data: KnowledgeCoffee and PurchaseLocation
X-squared = 35.066, df = NA, p-value = 0.1537
Relationships
Regressions
Call:
lm(formula = Subscription_Likely ~ KnowledgeCoffee)
Residuals:
Min 1Q Median 3Q Max
-3.8949 -2.2229 -0.5541 2.1059 7.1115
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.88533 0.52658 3.580 0.000421 ***
KnowledgeCoffee 0.33439 0.08723 3.833 0.000165 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.607 on 222 degrees of freedom
Multiple R-squared: 0.06208, Adjusted R-squared: 0.05785
F-statistic: 14.69 on 1 and 222 DF, p-value: 0.0001647
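The regression output can be read as a prediction rule: each additional point of KnowledgeCoffee is associated with roughly 0.33 more points of Subscription_Likely. The sketch below restates the reported coefficients and reproduces the t value, F-statistic and adjusted R-squared from them, up to the rounding of the printed inputs. The assumption that KnowledgeCoffee runs from 1 to 10 is inferred from the degrees of freedom of the chi-squared tests and is not stated in the source.

```python
# Values copied from the regression output above.
intercept, slope = 1.88533, 0.33439
se_slope = 0.08723
r_squared = 0.06208
n_obs = 222 + 2  # residual degrees of freedom plus two estimated coefficients


def predict(knowledge):
    """Predicted Subscription_Likely for a given KnowledgeCoffee score."""
    return intercept + slope * knowledge


t_slope = slope / se_slope               # reported: 3.833
f_stat = t_slope ** 2                    # one predictor, so F = t^2; reported: 14.69
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - 2)  # reported: 0.05785

# Assumed 1-10 scale for KnowledgeCoffee (an inference, not stated in the source).
lowest, highest = predict(1), predict(10)
print(round(lowest, 2), round(highest, 2))
```

The slope is statistically significant, but an R-squared of about 0.06 means coffee knowledge explains only around 6% of the variation in subscription likelihood.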
Include categorical variables as dummy variables.
Use Cook's distance to identify outliers.
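Cook's distance measures how much the fitted regression changes when a single observation is left out, combining the size of its residual with its leverage. The sketch below computes it for a simple one-predictor regression in plain Python. The data are hypothetical, and the 4/n cut-off is a common rule of thumb, not something the source specifies.

```python
def cooks_distances(x, y):
    """Cook's distance for each observation of a simple linear regression y ~ x."""
    n = len(x)
    p = 2  # estimated coefficients: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    leverages = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]

    return [
        (e * e / (p * mse)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


# Hypothetical data: the last point breaks an otherwise linear trend.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3, 4, 5, 6, 7, 8, 1]

distances = cooks_distances(x, y)
threshold = 4 / len(x)  # rule-of-thumb cut-off, an assumption
flagged = [i for i, d in enumerate(distances) if d > threshold]
print(flagged)
```

Flagged observations are candidates for inspection, not automatic removal. An extreme but genuine answer, such as a very heavy coffee drinker, may be a valid data point.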